86 research outputs found

Multiple-instrument polyphonic music transcription using a temporally constrained shift-invariant model

Author: Bay M.
Benetos E.
de Cheveigné A.
Dempster A. P.
Dessein A.
Dixon S.
Emmanouil Benetos
Fuentes B.
Goto M.
Lee C.-T.
Nakano M.
Pertusa A.
Poliner G.
Ryynänen M.
Simon Dixon
Smaragdis P.
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/03/2013

This work proposes a method for automatic transcription of polyphonic music that models the temporal evolution of musical tones. The model extends the shift-invariant probabilistic latent component analysis (PLCA) method by supporting spectral templates that correspond to sound states such as attack, sustain, and decay. The order of these templates is controlled by temporal constraints based on hidden Markov models (HMMs). In addition, the model can exploit multiple templates per pitch and per instrument source. Its shift-invariance makes it suitable for music signals that exhibit frequency modulations or tuning changes. Pitch-wise HMMs are also used in a postprocessing step for note tracking. For training, sound-state templates were extracted for various orchestral instruments from isolated note samples. The proposed transcription system was tested on multiple-instrument recordings from various datasets. Experimental results show that the proposed model outperforms a model without temporal constraints, as well as various state-of-the-art transcription systems on the same experiment.
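The ordering constraint on sound states can be illustrated with a left-to-right hidden Markov model decoded by the Viterbi algorithm. This is a minimal sketch, not the paper's model, which embeds the HMM inside the shift-invariant PLCA estimation; it assumes per-frame likelihoods for three states (attack, sustain, decay) are already available, and the transition probability `stay`, the toy likelihoods, and all function names are hypothetical.

```python
import math

# Hypothetical sound states for a single note, in the only order the constraint allows.
STATES = ["attack", "sustain", "decay"]

def left_to_right_transitions(n_states, stay=0.8):
    """Transition matrix in which a state can only persist or advance to the next state."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
        else:
            A[i][i] = 1.0  # the final state is absorbing
    return A

def _log(x):
    return math.log(x) if x > 0 else float("-inf")

def viterbi(obs_lik, A, init):
    """Most likely state sequence given per-frame state likelihoods obs_lik[t][s]."""
    n_states = len(A)
    delta = [_log(init[s]) + _log(obs_lik[0][s]) for s in range(n_states)]
    backpointers = []
    for frame in obs_lik[1:]:
        prev, delta, ptr = delta, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + _log(A[r][s]))
            ptr.append(best)
            delta.append(prev[best] + _log(A[best][s]) + _log(frame[s]))
        backpointers.append(ptr)
    path = [max(range(n_states), key=lambda s: delta[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy per-frame likelihoods: frame 3 looks attack-like, but the HMM cannot move backwards.
obs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1],
       [0.6, 0.3, 0.1], [0.1, 0.3, 0.6], [0.05, 0.15, 0.8]]
path = viterbi(obs, left_to_right_transitions(3), init=[1.0, 0.0, 0.0])
print([STATES[s] for s in path])
```

Because the transition matrix has zeros below the diagonal, the decoded path can never return to "attack" once it has moved on, which is the kind of ordering the abstract describes.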

City Research Online

Crossref

Audio Source Separation with Discriminative Scattering Networks

Author: C Févotte
DD Lee
E Vincent
J Bruna
J Han
J Mairal
P Smaragdis
S Mallat
Publication date: 27/04/2015

In this report we describe an ongoing line of research on single-channel source separation. Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge for these approaches is to effectively exploit temporal dependencies of the signals at scales longer than the duration of a time frame. In this work we propose to tackle this problem by modeling the signals with a time-frequency representation at multiple temporal resolutions. The proposed representation consists of a pyramid of wavelet scattering operators, which generalizes the constant-Q transform (CQT) with extra layers of convolution and complex modulus. We first show that learning standard models in this multi-resolution setting improves source separation results over fixed-resolution methods. As a case study, we use non-negative matrix factorization (NMF), which has been widely used in many audio applications. We then investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime, and discuss several alternatives using different deep neural network architectures.
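The NMF case study can be sketched with the classic Lee-Seung multiplicative updates for the squared Euclidean cost. This is a minimal plain-Python illustration, assuming a small nonnegative magnitude spectrogram `V` (frequency x time); it does not include the scattering-based multi-resolution features or the discriminative training described in the report, and the helper names and toy data are hypothetical.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def elementwise_update(X, num, den, eps):
    """X * num / den, entry by entry (the multiplicative-update rule)."""
    return [[x * n / (d + eps) for x, n, d in zip(xr, nr, dr)]
            for xr, nr, dr in zip(X, num, den)]

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor nonnegative V (freq x time) as W @ H with Lee-Seung
    multiplicative updates for the squared Euclidean cost."""
    rng = random.Random(seed)
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(len(V))]
    H = [[rng.random() + eps for _ in range(len(V[0]))] for _ in range(rank)]
    for _ in range(n_iter):
        Wt = transpose(W)
        H = elementwise_update(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        W = elementwise_update(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H

def squared_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr))

# Toy "spectrogram": 3 frequency bins x 4 frames, built from two spectral patterns.
V = [[1.0, 0.0, 1.0, 2.0],
     [0.0, 1.0, 1.0, 1.0],
     [1.0, 1.0, 2.0, 3.0]]
W, H = nmf(V, rank=2, n_iter=300)
print(squared_error(V, W, H))
```

Each column of W acts as a spectral template and each row of H as its activation over time; the multiplicative updates keep both factors nonnegative and, up to the small `eps` guard, do not increase the squared error.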

arXiv.org e-Print Archive

Crossref

BLUES from Music: BLind Underdetermined Extraction of Sources from Music

Author: A. Hyvärinen
A.S. Bregman
D. Kolossa
D.L. Wang
G. Hu
M.D. Plumbley
N. Roman
O. Yilmaz
P. Smaragdis
Publication venue: Springer Berlin / Heidelberg
Publication date: 01/01/2006

In this paper we propose to use an instantaneous ICA method (BLUES) to separate the instruments in a real stereo music recording. We combine two strong separation techniques to segregate instruments from a mixture: ICA and binary time-frequency masking. By combining the methods, we can exploit the fact that the sources are distributed differently in space, time, and frequency. Our method can segregate an arbitrary number of instruments, and the segregated sources are maintained as stereo signals. We have evaluated our method on real stereo recordings and can segregate instruments that are spatially distinct from the other instruments.
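The binary time-frequency masking step can be sketched as follows: each time-frequency bin of the mixture is assigned to whichever source estimate dominates it. This sketch assumes the ICA outputs have already been transformed to the same time-frequency grid as the mixture; the toy magnitudes and function names are hypothetical, and the exact mask construction used in BLUES is not reproduced here.

```python
def binary_masks(estimates):
    """One binary mask per source: 1 in each time-frequency bin where that
    source's estimate has the largest magnitude, 0 elsewhere."""
    n_src = len(estimates)
    n_freq, n_time = len(estimates[0]), len(estimates[0][0])
    masks = [[[0] * n_time for _ in range(n_freq)] for _ in range(n_src)]
    for f in range(n_freq):
        for t in range(n_time):
            winner = max(range(n_src), key=lambda s: abs(estimates[s][f][t]))
            masks[winner][f][t] = 1
    return masks

def apply_mask(mask, channel):
    """Apply a binary mask to one channel of the mixture's time-frequency representation."""
    return [[m * x for m, x in zip(mr, xr)] for mr, xr in zip(mask, channel)]

# Toy magnitudes (2 frequency bins x 2 frames) for two ICA outputs.
est_a = [[3.0, 0.5], [1.0, 2.0]]
est_b = [[1.0, 2.0], [4.0, 0.1]]
mask_a, mask_b = binary_masks([est_a, est_b])

# The same mask is applied to both mixture channels, so each separated source stays stereo.
left = [[4.0, 2.5], [5.0, 2.1]]
right = [[2.0, 1.5], [3.0, 1.0]]
source_a_stereo = (apply_mask(mask_a, left), apply_mask(mask_a, right))
print(source_a_stereo)
```

Since the masks partition the bins, the masked outputs of all sources add back up to the mixture in every channel.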

CiteSeerX

Crossref

Online Research Database In Technology

A Shift-Invariant Latent Variable Model for Automatic Music Transcription

Author: Bay M.
Benetos E.
Dessein A.
Emmanouil Benetos
Fuentes B.
Goto M.
Grindlay G.
Mysore G.
Poliner G.
Schörkhuber C.
Simon Dixon
Smaragdis P.
Publication venue: 'MIT Press - Journals'
Publication date: 01/12/2012

In this work, a probabilistic model for multiple-instrument automatic music transcription is proposed. The model extends the shift-invariant probabilistic latent component analysis method, which is used for spectrogram factorization. The proposed extensions support multiple spectral templates per pitch and per instrument source, as well as a time-varying pitch contribution for each source, so the method can effectively be used for multiple-instrument automatic transcription. In addition, the shift-invariant aspect of the method can be exploited to detect tuning changes and frequency modulations, as well as to visualize pitch content. For note tracking and smoothing, pitch-wise hidden Markov models are used. For training, pitch templates covering the complete note range of eight orchestral instruments were extracted. The transcription system was tested on multiple-instrument polyphonic recordings from the RWC database, a Disklavier dataset, and the MIREX 2007 multi-F0 dataset. Results demonstrate that the proposed method outperforms leading approaches from the transcription literature on several error metrics.
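The shift-invariance idea can be illustrated on a log-frequency spectrum, where a pitch deviation moves every harmonic by the same number of bins. The sketch below replaces the probabilistic shift estimation of shift-invariant PLCA with a simple search for the best-matching shift; the resolution of 60 bins per octave (20 cents per bin), the harmonic toy spectra, and the function names are assumptions for illustration.

```python
import math

# Assumed log-frequency resolution: 60 bins per octave, i.e. 5 bins per semitone (20 cents per bin).
BINS_PER_OCTAVE = 60
CENTS_PER_BIN = 1200 / BINS_PER_OCTAVE

def best_shift(template, frame, max_shift):
    """Shift (in log-frequency bins) of `template` that best matches `frame`,
    scored by the inner product of the shifted template with the frame."""
    n = len(frame)
    def score(k):
        return sum(template[i] * frame[i + k] for i in range(len(template)) if 0 <= i + k < n)
    return max(range(-max_shift, max_shift + 1), key=score)

def harmonic_spectrum(f0_bin, n_bins, n_harmonics=3):
    """Toy log-frequency spectrum with decaying peaks at the first few harmonics."""
    spec = [0.0] * n_bins
    for h in range(1, n_harmonics + 1):
        b = f0_bin + round(BINS_PER_OCTAVE * math.log2(h))
        if b < n_bins:
            spec[b] = 1.0 / h
    return spec

template = harmonic_spectrum(10, 128)  # stored pitch template
frame = harmonic_spectrum(12, 128)     # the same note played 2 bins sharp
k = best_shift(template, frame, max_shift=5)
print(k, k * CENTS_PER_BIN)
```

A single template thus covers small tuning deviations: the estimated shift, multiplied by the bin width, gives the deviation in cents.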

City Research Online

Crossref

Editorial: Special Section on Statistical and Perceptual Audio Processing

Author: Brown Judith C.
Ellis Daniel P. W.
Raj Bhiksha
Slaney Malcolm
Smaragdis Paris
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2006

Human perception has always been an inspiration for automatic processing systems, not least because tasks such as speech recognition only exist because people do them—and, indeed, without that example we might wonder if they were possible at all. As computational power grows, we have increasing opportunities to model and duplicate perceptual abilities with greater fidelity, and, most importantly, based on larger and larger amounts of raw data describing both what signals exist in the real world, and how people respond to them. The power to deal with large data sets has meant that approaches that were once mere theoretical possibilities, such as exhaustive search of exponentially-sized codebooks, or real-time direct convolution of long sequences, have become increasingly practical and even unremarkable. A major consequence of this is the growth of statistical or corpus-based approaches, where complex relations, discriminations, or structures are inferred directly from example data (for instance by optimizing the parameters of a very general algorithm). An increasing number of complex tasks can be given empirically optimal solutions based on large, representative datasets. The traditional idea of perceptually-inspired processing is to develop a machine algorithm for a complex task such as melody recognition or source separation through inspiration and introspection about how individuals perform the task, and on the basis of direct psychological or neurophysiological data. The results can appear to be at odds with the statistical perspective, since perceptually-motivated work is often ad-hoc, comprising many stages whose individual contributions are difficult to separate. We believe that it is important to unify these two approaches: to employ rigorous, exhaustive techniques taking advantage of the statistics of large data sets to develop and solve perceptually-based and subjectively-defined problems. 
With this in mind, we organized a one-day workshop on Statistical and Perceptual Audio Processing as a satellite to the International Conference on Spoken Language Processing (ICSLP-INTERSPEECH), held in Jeju, Korea, in September 2004.

Columbia University Academic Commons

BLUES from Music: BLind Underdetermined Extraction of Sources from Music

Author: A. Hyvärinen
A.S. Bregman
D. Kolossa
D.L. Wang
G. Hu
M.D. Plumbley
N. Roman
O. Yilmaz
P. Smaragdis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Crossref

Notes on nonnegative tensor factorization of the spectrogram for audio source separation: statistical insights and towards self-clustering of the spatial cues

Author: A. Ozerov
A. Shashua
C. Févotte
D.D. Lee
E. Vincent
F.D. Neeser
L.A. Shepp
P. Smaragdis
R.M. Parry
T. Virtanen
Y. Cao
Publication venue: HAL CCSD
Publication date: 21/10/2010

Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by Fitzgerald et al. as a means of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model, in contrast with the usual BSS assumptions, and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of Fitzgerald et al. requires a posteriori clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying nonpoint-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.
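The PARAFAC structure and the a posteriori grouping of components can be sketched as follows. `parafac_model` evaluates the model tensor from channel gains Q, spectral patterns W, and activations H; `group_by_panning` shows one simple way to cluster components by their spatial cues after factorization (1-D k-means on the stereo panning angle). Both are hypothetical illustrations: they do not implement the clustering within the factorization that the paper discusses, nor any particular measure of fit.

```python
import math

def parafac_model(Q, W, H):
    """Model tensor V_hat[i][f][n] = sum_k Q[i][k] * W[f][k] * H[n][k]
    (channel gains Q, spectral patterns W, activations H)."""
    K = len(Q[0])
    return [[[sum(Q[i][k] * W[f][k] * H[n][k] for k in range(K))
              for n in range(len(H))]
             for f in range(len(W))]
            for i in range(len(Q))]

def group_by_panning(Q, n_sources, n_iter=20):
    """Post-hoc grouping of NTF components into sources: 1-D k-means on each
    component's stereo panning angle atan2(Q[1][k], Q[0][k])."""
    angles = [math.atan2(Q[1][k], Q[0][k]) for k in range(len(Q[0]))]
    lo, hi = min(angles), max(angles)
    centers = [lo + (hi - lo) * j / max(n_sources - 1, 1) for j in range(n_sources)]
    labels = [0] * len(angles)
    for _ in range(n_iter):
        labels = [min(range(n_sources), key=lambda j: abs(a - centers[j])) for a in angles]
        for j in range(n_sources):
            members = [a for a, lab in zip(angles, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

# Four components: two panned left, two panned right (hypothetical gains).
Q = [[1.0, 0.9, 0.1, 0.2],
     [0.1, 0.2, 1.0, 0.95]]
print(group_by_panning(Q, n_sources=2))
```

Grouping after factorization, as here, is the step the paper proposes to fold into the factorization itself.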

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

Author: A. Webb
C. Fevotte
D.D. Lee
G.J. Brown
K. Fukunaga
M.R. Every
M.S. Pedersen
P. Smaragdis
P.A. Devijver
T. Virtanen
W. Wang
Y. Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Separating multiple music sources from a single-channel mixture is a challenging problem. We present a new approach to this problem based on non-negative matrix factorization (NMF) and note classification, assuming that the instruments used to play the sound signals are known a priori. The spectrogram of the mixture signal is first decomposed into building components (musical notes) using an NMF algorithm. The Mel-frequency cepstral coefficients (MFCCs) of both the decomposed components and the signals in the training dataset are extracted. The mean squared errors (MSEs) between the MFCC features of each decomposed component and those of the training signals are used as similarity measures for the decomposed notes. The notes are then labelled with the corresponding instrument type by the K-nearest-neighbors (K-NN) classification algorithm based on the MSEs. Finally, the source signals are reconstructed from the classified notes and the weighting matrices obtained from the NMF algorithm. Simulations are provided to show the performance of the proposed system. © 2011 Springer-Verlag Berlin Heidelberg
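The classification and reconstruction stages can be sketched in plain Python. The sketch assumes MFCC vectors have already been computed for each NMF component and for the labelled training signals; the toy 3-dimensional vectors, the choice k = 3, and the function names are hypothetical. `reconstruct` works on the magnitude spectrogram only; recovering time-domain signals would additionally need phase information and an inverse transform.

```python
from collections import Counter

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def knn_label(feature, training, k=3):
    """Label a component by majority vote among the k training vectors with the
    smallest MSE to `feature`. `training` is a list of (vector, instrument) pairs."""
    nearest = sorted(training, key=lambda item: mse(feature, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def reconstruct(W, H, labels, instrument):
    """Sum the NMF components W[:, k] H[k, :] whose label matches `instrument`."""
    out = [[0.0] * len(H[0]) for _ in range(len(W))]
    for k, lab in enumerate(labels):
        if lab == instrument:
            for f in range(len(W)):
                for t in range(len(H[0])):
                    out[f][t] += W[f][k] * H[k][t]
    return out

# Hypothetical 3-dimensional "MFCC" vectors (real MFCC vectors are typically longer).
training = [([1.0, 0.0, 0.0], "piano"), ([1.1, 0.1, 0.0], "piano"), ([0.9, 0.0, 0.1], "piano"),
            ([0.0, 1.0, 1.0], "violin"), ([0.1, 0.9, 1.1], "violin"), ([0.0, 1.2, 0.9], "violin")]
labels = [knn_label([1.0, 0.05, 0.05], training), knn_label([0.0, 1.0, 1.05], training)]
W = [[1.0, 0.0], [0.0, 2.0]]  # two decomposed components (2 frequency bins)
H = [[1.0, 1.0], [3.0, 0.0]]  # their activations over 2 frames
piano_part = reconstruct(W, H, labels, "piano")
print(labels, piano_part)
```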

Crossref

University of Surrey

Surrey Research Insight

An iterative model-based approach to cochannel speech separation

Author: A Narayanan
A Nádas
A Reddy
AP Varga
CH Taal
DeLiang Wang
DL Wang
G Hu
G Kim
GJ Mysore
J Barker
JR Hershey
K Hu
Ke Hu
M Cooke
MH Radfar
P Mowlaee
P Smaragdis
R Saeidi
R Weiss
S Rennie
S Roweis
Y Shao
YT Yeung
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Automatic transcription of Turkish microtonal music

Author: Anderson Sutton R.
André Holzapfel
Arel H. S.
Bay M.
Benetos E.
de Cheveigné A.
Dempster A. P.
Dessein A.
Emmanouil Benetos
Erkut C.
Karaosmanoğlu K.
Lee K.
Macrae R.
Nesbit A.
Smaragdis P.
Stock J. P. J.
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2015

Automatic music transcription, a central topic in music signal analysis, is typically limited to equal-tempered music and evaluated at a quarter-tone tolerance. A system is proposed to automatically transcribe microtonal and heterophonic music, applied to the makam music of Turkey. Specific traits of this music that deviate from the properties targeted by current transcription tools are discussed, and a collection of instrumental and vocal recordings is compiled, along with aligned microtonal reference pitch annotations. An existing multi-pitch detection algorithm is adapted for transcribing music at 20-cent resolution, and a method for converting a multi-pitch heterophonic output into a single melodic line is proposed. Evaluation metrics for transcribing microtonal music are applied, which use various levels of tolerance for inaccuracies in frequency and time. Results show that the system is able to transcribe microtonal instrumental music at 20-cent resolution with an F-measure of 56.7%, outperforming state-of-the-art methods for the same task. Case studies on transcribed recordings are provided to demonstrate the shortcomings and strengths of the proposed method.
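The 20-cent resolution and the tolerance-based F-measure can be made concrete with a few helper functions. This is a minimal sketch, assuming pitches are expressed in cents relative to a reference frequency (such as a tonic) and that each frame's detections are matched one-to-one against reference pitches within a tolerance; the matching rule, toy data, and function names are hypothetical and are not the paper's evaluation code.

```python
import math

def hz_to_cents(freq_hz, ref_hz):
    """Pitch of freq_hz in cents relative to a reference frequency."""
    return 1200.0 * math.log2(freq_hz / ref_hz)

def quantize_cents(cents, resolution=20.0):
    """Snap a pitch to the nearest multiple of `resolution` cents (a 20-cent grid by default)."""
    return resolution * round(cents / resolution)

def frame_counts(estimated, reference, tol_cents):
    """An estimated pitch (in cents) is correct if it lies within tol_cents of a
    not-yet-matched reference pitch in the same frame."""
    correct = n_est = n_ref = 0
    for est_frame, ref_frame in zip(estimated, reference):
        unmatched = list(ref_frame)
        n_est += len(est_frame)
        n_ref += len(ref_frame)
        for e in est_frame:
            for r in unmatched:
                if abs(e - r) <= tol_cents:
                    unmatched.remove(r)  # safe: we break out of the loop immediately
                    correct += 1
                    break
    return correct, n_est, n_ref

def f_measure(correct, n_est, n_ref):
    precision = correct / n_est if n_est else 0.0
    recall = correct / n_ref if n_ref else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy frames of pitches in cents above a hypothetical tonic.
reference = [[0.0], [204.0], [204.0], []]
estimated = [[10.0], [180.0], [200.0], [500.0]]
counts = frame_counts(estimated, reference, tol_cents=20.0)
print(counts, f_measure(*counts))
```

Widening `tol_cents` relaxes the frequency tolerance, which is one way to evaluate the same output at several levels of strictness.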

Publikationer från KTH

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Queen Mary Research Online